1. INTRODUCTION


In any cost-effective stratified sampling design, the optimal sample size and 
its allocation between the different strata depend on the within-stratum vari- 
ances, the stratum size, and the precision required for the estimate. With 
the development of an area sampling frame, strata sizes are known in terms of 
the total number of sampling units per stratum. The precision goal is fixed 
in advance and hence known. However, prior to the survey, no direct knowledge 
of within-stratum variances is available; therefore, it is necessary to esti- 
mate them. Usually, a pilot survey is conducted and, subsequently, the infor- 
mation resulting from the pilot study is utilized in planning a full-scale 
sample survey. In this report, a methodology for indirectly estimating stra- 
tum variances using existing agricultural statistics and other ancillary 
information is proposed and evaluated for the U.S. Great Plains (USGP). 

In most countries, crop statistics are computed annually either through com- 
plete enumeration or by employing sample survey methodology. However, the 
geographical level and the type of crop statistics reported vary considerably 
from one country to another. For example, reliable crop statistics for area, 
yield, and production are available in the United States at the county level. 
In contrast, crop statistics are not available for China at a political sub- 
division level lower than the country level. Canada, India, and several other 
countries provide fairly reliable annual crop statistics at a geographic level 
similar to the U.S. county. Yet, even among these countries, the type of crop 
statistics produced is varied; for example, in Australia, annual crop statis- 
tics contain no information on harvested acreage. Consequently, no fixed 
procedure can be applied to each and every country for determining the within- 
stratum variances. 

Initially, in the Large Area Crop Inventory Experiment (LACIE), a proportional 
sample allocation based primarily on historical wheat production was employed. 
That is, a fixed total sample size was allocated to the different countries of 
interest and to the smaller political subdivisions within each country so as 
to be proportional to the historical wheat production of the different 
geographic subdivisions. In the later phases of LACIE, methods were devised 
to estimate the within-stratum variances by utilizing past Landsat imagery and 
other ancillary data. These estimates permitted a more nearly optimal sampl- 
ing allocation to be employed during the final phases of LACIE. 

During the first year of concentration in a crop/region, little to no previ- 
ously analyzed Landsat data are available for making within-stratum variance 
estimates; this will be the case in many crop/regions of the Agriculture and 
Resources Inventory Surveys Through Aerospace Remote Sensing (AgRISTARS) pro- 
gram. Thus, a technique is needed for making initial within-stratum variance 
estimates without the use of previously analyzed Landsat data. The descrip- 
tion and the evaluation of such a technique are presented in this report. The 
technique is motivated by the empirical models employed by Perry and Hallum 
(ref. 1) in their study on sampling unit size. Also discussed in this context 
are the methodologies employed during the LACIE to estimate the within-stratum 
variances for sample allocation in the crop survey program of the Earth Obser- 
vations Division (EOD), National Aeronautics and Space Administration (NASA), 
Lyndon B. Johnson Space Center (JSC). The remainder of this report is 
organized as follows. The approaches adopted in LACIE Phases I, II, and 
III and in the Transition Year (TY) are described in section 2. Details of 
the proposed technique are given in section 3. Different variations of this 
procedure as applied to estimate refined-stratum variances for wheat in the 
USGP are given in section 4.1. [Refer to Chhikara (ref. 2) for details of the 
stratification considered in this study.] A discussion of the stratum- 
variance estimates obtained using the different methods is given in 
section 4.3. It is concluded in section 5 that if reliable historical crop 
acreages are available at a small political subdivision level (e.g., county in 
the U.S.), then fairly good stratum-variance estimates can be obtained using 
the proposed method. 

The technique for making initial within-stratum variance estimates is designed 
to make optimal use of the available data (even if limited by its reliability) 
for estimating within-stratum variances for crop/regions that otherwise would 
not be estimated because previously analyzed Landsat data are not available. 


2. PREVIOUS APPROACHES 


2.1 LACIE PHASES I AND II 

During Phases I and II of LACIE, the total sample size was determined 
primarily by engineering and resource constraints. However, sample survey 
methodology [the Neyman Optimum Allocation Formula (ref. 3)] shows that, if 
the total sample is allocated to the different strata in proportion to the 
product of stratum size and within-stratum standard deviation, the resulting 
crop estimate has minimum variance for a fixed overall sample size. Thus, for 
a cost-effective design, knowledge of within-stratum variances is required. 
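
The allocation rule just stated can be sketched in a few lines. The following 
is a minimal illustration only, not the LACIE allocation software: the function 
name, the three strata, and their sizes and standard deviations are 
hypothetical, and rounding to whole segments is deliberately left to the user. 

```python
def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Split total_n sample units across strata in proportion to N_h * S_h."""
    weights = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("need at least one stratum with positive size and SD")
    # Fractional allocations; rounding to whole segments is not handled here.
    return [total_n * w / total_weight for w in weights]


# Hypothetical inputs: three strata with made-up sizes (number of sampling
# units) and made-up within-stratum standard deviations.
allocation = neyman_allocation(100, [400, 300, 300], [0.10, 0.20, 0.05])
```

In this made-up case the second stratum receives the largest share even though 
it is not the largest stratum, because its standard deviation is the highest. 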

In order to estimate the within-stratum variances used as input into the 
Neyman allocation formula, the binomial model was assumed where the sampling 
unit had dimensions of 5 by 6 nautical miles (a segment). That is, if p is 
the crop (wheat/small-grains) proportion for a stratum, then p(1 - p) is a 
rough estimate of the between-segment crop proportion variance for the 
stratum. It was recognized that this model overestimates the within-stratum 
variance for all strata because the model assumes that every segment is 
entirely wheat or nonwheat, which is far from reality even in the new lands 
of the Union of Soviet Socialist Republics (U.S.S.R.). However, it was 
considered reasonable to assume that these estimates reflected the relative 
magnitudes among the true within-stratum variances. Hence, it was thought 
that the total sample was utilized in a cost-effective manner. It was also 
recognized that an optimal overall sample size could not be determined using 
a binomial model because of considerable positive bias in the variance 
estimates produced by the model. 

2.2 LACIE PHASE III 

For this period, greater emphasis was placed on achieving more accurate crop 
acreage and production estimates. As a result, a decision was made to 
reallocate the sample segments in the USGP for LACIE Phase III. Among other 
factors, this decision was based on the desirability of having more reliable 
within-stratum variance estimates as input variables to the allocation formula 
than could be obtained from the binomial model. It was noted that the sample 
units were large and could be expected to contain some nonagricultural areas. 
Also, it was envisioned that if the segment crop area were related to the 
segment agricultural area, then this statistical relationship could be 
exploited to produce an improved within-stratum variance estimation 
procedure. This, in fact, proved to be the case. The resulting within-stratum 
variance estimation technique was derived using the following approach. 

The crop proportion in a sample segment was expressed as 

p = ra (1) 

where 

p = the proportion of crop acreage in a segment 

r = the ratio of crop acreage to agricultural acreage in a segment 

a = the proportion of agricultural acreage in a segment 

It was assumed that the ratio r did not depend on the proportion of agricul- 
tural acreage in a segment. Then, the variance of p was easily computed using 
the formula for the variance of the product of two independent random 
variables. For each stratum, this yielded the following formula. 

    $\sigma_p^2 = \sigma_r^2 [E^2(a) + \sigma_a^2] + \sigma_a^2 E^2(r)$          (2) 

The mean and variance of the proportion of agricultural acreage, E(a) and 
$\sigma_a^2$, respectively, were obtained directly from estimates of the 
proportion of agricultural land in each segment in a stratum. The available 
Landsat imagery was used for this determination. However, it was not feasible 
to obtain directly such information for the variable r. Instead, the mean and 
variance, E(r) and $\sigma_r^2$, were estimated for each stratum as follows: 
E(r) was estimated by 

    $r_h = \frac{\text{Historical crop acreage for stratum } h}{\text{Landsat agricultural acreage for stratum } h}$          (3) 

and $\sigma_r^2$ by 

    $S_r^2 = K r_h (1 - r_h)$          (4) 

where K = 0.03 

The value of K in equation (4) was based on an empirical study for small 
grains where the mean and variance of r were computed from segment data 
obtained from Landsat imagery for 40 counties in the USGP. These counties 
were considered as strata. Then the stratum variance was modeled by 

    $\sigma_r^2 = K r (1 - r)$          (5) 

where r was the mean ratio of crop acreage to agricultural acreage in the 
stratum. A least-squares fit of this model resulted in K = 0.03. The 
adjustment, K, to the binomial variance, $r_h (1 - r_h)$, reflected the 
departure from the assumption that the ratio of crop acreage to agricultural 
acreage in a segment was 0 or 1. Thus, the determination of $\sigma_r^2$ from 
equation (4) could only be regarded as approximate and tenuous. Accordingly, 
the resulting stratum variance estimate was 

    $S_p^2 = 0.03 r_h (1 - r_h)(\bar{a}^2 + S_a^2) + S_a^2 r_h^2$          (6) 

where $\bar{a}$ and $S_a^2$ were the mean and variance of the proportion of 
agricultural acreage in a segment, respectively, and where this proportion was 
determined by using Landsat imagery for the stratum. The properties of $S_p^2$ 
could not be determined for several reasons. The most obvious reasons were the 
empirical nature of the derivation and the historical nature of the input 
data. Nevertheless, for initial within-stratum variance estimates, this model 
was expected to be an improvement over the binomial model considered in 
Phases I and II of LACIE. 
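
The Phase III computation can be illustrated numerically. The sketch below 
evaluates equation (6), with the variance of r taken from equation (4), and 
compares the result with the Phase I and II binomial value. The function names 
and the input values (a ratio of 0.40, a mean agricultural proportion of 0.70 
with variance 0.04) are hypothetical and are not taken from the LACIE data. 

```python
def phase3_stratum_variance(r_h, a_mean, a_var, k=0.03):
    """Equation (6): variance of p = r * a for independent r and a."""
    if not 0.0 <= r_h <= 1.0:
        raise ValueError("r_h must be a ratio between 0 and 1")
    var_r = k * r_h * (1.0 - r_h)  # equation (4) with K = 0.03 by default
    return var_r * (a_mean ** 2 + a_var) + a_var * r_h ** 2


def binomial_variance(p):
    """Phase I and II model: every segment is all crop or no crop."""
    return p * (1.0 - p)


# Hypothetical stratum: 40 percent of the agricultural land is in the crop;
# segments average 70 percent agricultural land with variance 0.04.
r_h, a_mean, a_var = 0.40, 0.70, 0.04
s_p2 = phase3_stratum_variance(r_h, a_mean, a_var)
p = r_h * a_mean                      # stratum crop proportion, from p = ra
binomial_value = binomial_variance(p)
```

For these made-up inputs the equation (6) value is far below the binomial 
value, which is the direction of correction the text describes. 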
2.3 TRANSITION YEAR (TY) 


The method of computing initial stratum variance estimates for use in the TY 
project was influenced by two developments. First, a geographical stratifica- 
tion based on agrophysical characteristics had been developed for the TY sam- 
pling design (ref. 4). Second, sample data from LACIE Phase II in the form of 
segment wheat and small-grains proportion estimates were available for use in 
direct estimation of the stratum variances. Although these sample data did 
not constitute a random sample relative to the new stratification, it was 
generally assumed that estimates based on these data would be more reliable 
than those obtained by using the earlier indirect methods. However, for some 
strata, sufficient segment data needed for directly estimating the stratum 
variance were not available. When this occurred, the stratum variance was 
estimated indirectly by employing the approach used in LACIE Phase III. The 
nonrepresentative nature of the sample data used in the direct estimates and 
the use of two altogether different methods of estimation could have led to 
inconsistencies among the stratum variance estimates. If true, this would 
have adversely affected the associated sample allocation. 

An evaluation of the TY sample allocation was performed using the LACIE 
Phase III sample segment estimates. Phase III segment estimates were used 
because they were available and were regarded as more reliable than those from 
Phase II. The evaluation indicated an underallocation of sample segments to 
some strata and an overallocation of sample segments to other strata. For 
further details, refer to Chhikara (ref. 2). However, in reference 2, the 
effect of the nonrepresentative nature of the LACIE Phase III segment data 
with respect to the TY strata was not considered. 

For sample allocations in the future program of AgRISTARS, it would be ideal 
to have reliable and representative Landsat segment estimates in order to make 
direct initial estimates of the stratum variances. However, it is not expec- 
ted that initially such data will be available for most countries of interest. 
Accordingly, some indirectly derived stratum-variance estimates will need to 
be determined for the purpose of making a sample allocation. The approach 
used for LACIE Phase III seems reasonable and feasible except for the 
determination of the variance of the ratio of crop acreage to agricultural 
acreage. A new procedure for obtaining initial stratum crop proportion 
variances is offered and described in section 3. The procedure is equally 
applicable to estimating the stratum variance $\sigma_r^2$. 


3. PRESENT METHODOLOGY 


A procedure for indirectly estimating the stratum variances used in an initial 
allocation is presented. There are three basic underlying ideas: first, 
obtain estimates of the stratum variance for a set of sampling unit sizes 
including both large and small sampling units; second, establish 
empirically a relationship between the sampling unit size and the stratum 
variance; and third, use the empirical model to obtain an estimate of the 
stratum variance for the desired sampling unit size, which is a segment. 


In the context of crop estimation, Smith (ref. 5) and Mahalanobis (ref. 6), 
independently of each other, proposed that the stratum between-units variance 
could be modeled as a power function of the sampling unit size. Historically, 
a number of empirical studies [Smith, Mahalanobis, Jessen, Hansen et al., and 
Asthana (refs. 5, 6, 7, 8, and 9, respectively)] strongly indicate that the 
power function provides a simple, yet satisfactory, mathematical model for the 
functional dependence of the stratum between-units variance on the sampling 
unit size. The first application of this functional form specifically to the 
between-units crop proportion variance was made by P. C. Mahalanobis (ref. 6) 
in his 1938 study of jute production for Bengal (India). He considered the 
following function for the stratum between-units crop proportion variance. 


    $\sigma_x^2 = \frac{p(1 - p)}{(bx)^g}$          (7) 


where p is the stratum crop proportion and x is the sampling unit size. The 
sampling unit sizes considered in this study were 1, 2.25, 4, 6.25, and 
9 acres. 

The rationale behind the variance formulation in equation (7) is as follows: 
when x = 1/b, the variance $\sigma_x^2$ = p(1 - p) and 1/b represents the 
largest area (e.g., crop field) for which the crop proportion is either 0 or 
1. As x increases in size away from 1/b, the denominator in equation (7) 
increases and $\sigma_x^2$ decreases with p(1 - p) as an upper bound. If it is 
assumed that fields in a stratum are not mixed and all fields are 
approximately of equal size, the difference between the average field size and 
the sampling unit size being considered should be indicative of the decrease 
in $\sigma_x^2$ from p(1 - p); a smaller decrease in $\sigma_x^2$ is expected 
with a smaller difference between the sampling unit size and 1/b. 
Consequently, the bias in estimating $\sigma_x^2$ by p(1 - p) will be smaller 
for the smaller size sampling unit, and it is zero when the sampling unit size 
is less than or equal to 1/b. 
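
The behavior described in this rationale can be seen in a short numerical 
sketch of equation (7). The function name and the values used (a 40-acre 
largest unmixed area, an exponent of 0.5, and a crop proportion of 0.2) are 
hypothetical; the exponent is written g here, and capping the variance at 
p(1 - p) for units no larger than 1/b follows the reading given above. 

```python
def power_law_variance(p, x, b, g):
    """Between-units variance p(1 - p) / (b * x)**g, capped at p(1 - p)."""
    if x <= 0 or b <= 0:
        raise ValueError("x and b must be positive")
    bound = p * (1.0 - p)          # binomial upper bound
    if x <= 1.0 / b:
        return bound               # unit no larger than an unmixed field
    return bound / (b * x) ** g


# Hypothetical stratum: crop proportion 0.2, 1/b = 40 acres, exponent 0.5.
p, b, g = 0.2, 1.0 / 40.0, 0.5
at_field_size = power_law_variance(p, 40.0, b, g)     # x = 1/b
at_four_fields = power_law_variance(p, 160.0, b, g)   # x = 4/b
below_field_size = power_law_variance(p, 10.0, b, g)  # x < 1/b
```

With these inputs the variance equals the bound of 0.16 at and below the 
field size and falls to about half of it when the unit is four times larger. 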

This same model was employed by Perry and Hallum (ref. 1) in their sampling 
unit size study. Their study was based on the LACIE Phase III ground-truth 
data set and concluded that indeed the power function does provide a 
satisfactory model for the between-units wheat acreage (or proportion) 
variance for sampling unit sizes ranging from 171 to 25 426 acres. Several 
other studies, particularly those by Jessen (ref. 7) and Asthana (ref. 9), 
show this general relationship to hold reasonably well even for very large 
areal units, a county for example. 

The relationship in equation (7) can be rewritten as 

    $\sigma_x^2 = \alpha x^{\beta}$          (8) 

where 

x = the sampling unit size 

$\sigma_x^2$ = the stratum crop proportion variance corresponding to x 

and $\alpha$ and $\beta$ are parameters to be empirically determined for each 
stratum. 

In developing this model for the different strata, it would be ideal to have 
knowledge of $\sigma_x^2$ over a wide range of sampling unit sizes, x. For 
most countries, this is not feasible because it would require expensive 
sampling or complete enumeration to be performed, thus defeating the purpose 
of employing the model in the first place. Therefore, one is led in 
least-squares estimation of the stratum parameters $\alpha$ and $\beta$ to 
choose sampling unit sizes for which $\sigma_x^2$ can be estimated directly 
from existing agricultural statistics or can be mathematically modeled and 
then estimated from existing agricultural statistics. 
In the U.S., crop statistics are available at the county level and a stratum 
normally consists of many counties. Thus, the between-counties variance can 
be easily computed and used as an estimate of stratum variance corresponding 
to a sampling unit approximately equal to the average county size. However, 
since the counties often vary considerably in size, the stratum variance 
should vary statistically as the sampling unit size varies from the smallest 
to the largest county. This statistical variability may be preserved by using 
a one-point estimate of $\sigma_x^2$ for each county in the stratum. The 
one-point estimates are obtained as follows. Consider the county as a sampling 
unit where 

$x_i$ = the size of the i-th county in a stratum 

$p_i$ = the proportion of crop acreage for the i-th county in the stratum 

p = the proportion of crop acreage in the stratum 

Then the squared deviation 

    $\hat{\sigma}_{x_i}^2 = (p_i - p)^2$          (9) 

provides an estimate of $\sigma_{x_i}^2$ for the sampling unit size $x_i$. 
Although these county level estimates can be expected to provide guidance in 
estimating the stratum variance for a sampling unit approximately the size of 
a county, they alone cannot be expected to be sufficient to predict the 
stratum variance for a sampling unit of the size of a LACIE segment since it 
will be outside the sampling unit size range for the counties. 
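
The one-point estimates of equation (9) are simple to compute from county 
totals. The sketch below is illustrative only: the function name is 
hypothetical, the three counties are made up, and the stratum proportion is 
taken to be total crop acreage divided by total county size, which is an 
assumption about how p is formed. 

```python
def one_point_estimates(county_sizes, county_crop_acres):
    """Return the stratum proportion p and (x_i, (p_i - p)**2) per county."""
    total_size = sum(county_sizes)
    if total_size <= 0:
        raise ValueError("county sizes must sum to a positive value")
    p = sum(county_crop_acres) / total_size       # stratum crop proportion
    pairs = []
    for x_i, crop_i in zip(county_sizes, county_crop_acres):
        p_i = crop_i / x_i                         # county crop proportion
        pairs.append((x_i, (p_i - p) ** 2))        # equation (9)
    return p, pairs


# Hypothetical stratum of three counties (sizes and crop areas in the same
# arbitrary area unit).
p, pairs = one_point_estimates([100.0, 200.0, 100.0], [10.0, 60.0, 30.0])
```

Each pair supplies one (size, variance) point at the large end of the size 
range; a county whose proportion equals p contributes a zero, which matters 
if the points are later transformed to a logarithmic scale. 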

The next three estimates are developed for use with small sampling unit sizes. 
Any one of these estimates along with the one-point variance estimates from 
equation (9) is used for the least-squares estimation of the parameters 
$\alpha$ and $\beta$. The resulting regression curve is evaluated for the 
sampling unit size of interest (segment) to obtain the corresponding stratum 
variance estimate. Later, it will be observed empirically that the last two 
relationships provide fairly reliable stratum variance estimates. 
First, suppose that all fields are of the same size and shape and the sampling 
unit is randomly placed with the exception that it intersects only one field. 
Then the stratum variance corresponding to the field size, $x_0$, is given by 
the binomial variance 

    $\sigma_{x_0}^2 = \pi(1 - \pi)$          (10) 

where $\pi$ is the proportion of the fields belonging to the crop type of 
interest. For a fixed crop proportion p and a fixed sampling unit size, the 
between-units variance is maximized when the sampling unit proportions are all 
either 0 or 1. Thus, equation (10) provides an upper bound of p(1 - p) for 
the stratum variance regardless of the sampling unit size. This feature and 
the method, in general, are illustrated in figure 3-1. 


Second, in a Landsat-type sampling process, the sampling unit is randomly 
located and is expected to intersect more than one field. Thus, a closer 
approximation to $\sigma_{x_0}^2$ than that given in equation (10) is 
desirable. An exact determination of the variance $\sigma_{x_0}^2$ is not 
feasible. However, a realistic approximation is developed in appendix A under 
the following assumptions: (1) all fields are square and equal in size to the 
sampling unit size, $x_0$, (2) the contents of any four adjacent fields are 
uncorrelated with respect to the crop of interest, and (3) the sampling unit 
is randomly placed with the exception that its sides are parallel to the field 
boundaries. The resulting estimate is given by 


    $\sigma_{x_0}^2 = \frac{4}{9} p(1 - p)$          (11) 

where p is the stratum crop proportion. 
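
As a plausibility check on the size of this reduction, and not as a 
reproduction of appendix A, the three assumptions can be simulated directly. 
In the sketch below a unit-square sampling unit is dropped at a random offset 
onto a grid of unit-square fields, each field is independently all crop with 
probability p, and the crop proportion of the unit is the overlap-weighted 
average of the four fields it touches. The function name, the uniform offsets, 
and the independence of the fields are this sketch's reading of assumptions 
(1) to (3); under that reading the variance works out to (4/9) p(1 - p). 

```python
import random


def simulate_unit_variance(p, n_units=100_000, seed=0):
    """Monte Carlo variance of the crop proportion in a randomly placed unit."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_units):
        u = rng.random()   # horizontal offset of the unit within a field
        v = rng.random()   # vertical offset of the unit within a field
        # Four overlapped fields, each independently crop (1) or not (0).
        f = [1.0 if rng.random() < p else 0.0 for _ in range(4)]
        prop = (u * v * f[0] + u * (1 - v) * f[1]
                + (1 - u) * v * f[2] + (1 - u) * (1 - v) * f[3])
        total += prop
        total_sq += prop * prop
    mean = total / n_units
    return total_sq / n_units - mean * mean


p = 0.3                                    # hypothetical crop proportion
simulated = simulate_unit_variance(p)
reference = 4.0 / 9.0 * p * (1.0 - p)      # value under the stated reading
```

The factor 4/9 arises because the expected sum of squared overlap weights is 
(2/3)(2/3) when both offsets are uniform on the unit interval. 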

Third, when the sampling unit size $x_0$ is small relative to the size of the 
fields, then it is possible to derive the variance in a somewhat exact form as 
described in appendix B. In this case, the estimate corresponding to the 
small sampling unit $x_0$, referred to as a pixel, is approximated by the 
equation 

    $\sigma_{x_0}^2 = a_1 (1 - p) + a_2 p^2 + a_3 (0.3682 - p + p^2)$          (12) 

where $a_1$, $a_2$, and $a_3$ are defined and evaluated in terms of the crop 
proportion and the field size distribution. 

Figure 3-1.- An illustration of the fitted model. [Figure label: upper bound, 
equation (10).] 

As outlined earlier, equation (9) combined with any one of the equations (10), 
(11), or (12) provides stratum-variance estimates over widely separated 
sampling unit sizes from which the parameters $\alpha$ and $\beta$ can be 
determined using a least-squares fit. An estimate of the stratum variance 
corresponding to a specified sample unit size, x, is then obtained by 
evaluating along the fitted curve 

    $\hat{\sigma}_x^2 = A x^B$          (13) 

where A and B are the least-squares estimates of the parameters $\alpha$ and 
$\beta$. 
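
A minimal sketch of this fit-and-evaluate step is given below. It uses an 
ordinary least-squares fit on the logarithmic scale without any constraint, 
which is only one of several possible fitting criteria and is assumed here for 
simplicity. The function names are hypothetical, the (size, variance) points 
are synthetic values generated from a known power curve rather than survey 
data, and the segment size of 25 426 acres is the area of a 5- by 
6-nautical-mile unit. 

```python
import math


def fit_power_law(points):
    """Least squares of log(variance) on log(size); returns (A, B)."""
    logs = [(math.log(x), math.log(v)) for x, v in points if x > 0 and v > 0]
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two points with positive variance")
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    if sxx == 0:
        raise ValueError("sampling unit sizes must not all be equal")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    b_hat = sxy / sxx
    a_hat = math.exp(mean_y - b_hat * mean_x)
    return a_hat, b_hat


def predict_variance(a_hat, b_hat, x):
    """Evaluate the fitted curve of equation (13) at sampling unit size x."""
    return a_hat * x ** b_hat


# Synthetic points: one small unit (40 acres) and three county-sized units,
# all generated from the curve 0.5 * x**-0.4 so the fit can be checked.
sizes = [40.0, 300000.0, 450000.0, 600000.0]
points = [(x, 0.5 * x ** -0.4) for x in sizes]
a_hat, b_hat = fit_power_law(points)
segment_variance = predict_variance(a_hat, b_hat, 25426.0)
```

Because the segment size lies between the small unit and the county sizes, 
the evaluation is an interpolation along the fitted curve rather than an 
extrapolation beyond the county range. 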

It will be seen from the numerical results that use of either equation (11) 
or equation (12) leads to fairly reliable segment-level variance estimates. 
Yet, 
equation (11) is probably preferable if accurate determination of the field 
sizes can be made or if the field sizes are large. Otherwise, it is probably 
better to use equation (12) since it should be less sensitive to error in the 
field size measurements. 

Other estimates of the within-stratum variances can be developed by, first, 
using one of the above methods to estimate $\sigma_r^2$ followed by the 
application of equation (2) to estimate $\sigma_p^2$. However, this type of 
substitution will likely result in less reliable estimates unless the proposed 
method estimates $\sigma_r^2$ significantly better than $\sigma_p^2$. 
4. VARIANCE ESTIMATION FOR WHEAT IN THE USGP 


4.1 WITHIN-STRATUM VARIANCE ESTIMATION METHODS 


Described in this section and evaluated in section 4.3 are the within-stratum 
variance estimation methods derived from the methodology discussed in 
section 3. Different methods are created not only by combining the county size 
units with the field or smaller size units but also by combining the type of 
least-squares fit used with either a direct estimation of $\sigma_p^2$ or an 
indirect estimation of $\sigma_p^2$ by way of $\sigma_r^2$. Three combinations 
of the sampling unit sizes for the stratum variance estimation are considered 
in the evaluation: field, equations (9) and (10); field, equations (9) and 
(11); pixel, equations (9) and (12). The least-squares fit is approached in 
three different ways: (1) transform the data into logarithmic scale and then 
minimize the sum of squared deviations; (2) minimize the absolute difference 
between the aggregated variance resulting from the use of the model equation 
and the aggregated squared deviations obtained using equation (9); and (3) 
minimize the sum of squared deviations of variances given by the model from 
those resulting from the use of equation (9). In each case, the curve 
$\hat{\sigma}_x^2 = A x^B$ is passed through the point 
($x_0$, $\sigma_{x_0}^2$). The different criteria are listed in table 4-1 
where, of course, A is replaced by $\sigma_{x_0}^2 / x_0^B$ and the summation 
is understood to be taken over all the counties in a stratum. 
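
The three criteria can be sketched as functions of the single remaining 
parameter B once the curve is forced through the small-unit point. The code 
below follows the verbal descriptions above and is not a transcription of 
table 4-1; the function names, the grid search over B, the search interval, 
and the synthetic county points are all assumptions made for illustration. 

```python
import math


def anchored_model(x, x0, var0, b):
    """Power curve forced through the small-unit point (x0, var0)."""
    return var0 * (x / x0) ** b


def criterion_c1(b, pairs, x0, var0):
    # Squared deviations on the logarithmic scale; zero deviations skipped.
    return sum((math.log(anchored_model(x, x0, var0, b)) - math.log(d)) ** 2
               for x, d in pairs if d > 0)


def criterion_c2(b, pairs, x0, var0):
    # Absolute difference between aggregated model variance and aggregated
    # squared deviations from equation (9).
    return abs(sum(anchored_model(x, x0, var0, b) for x, _ in pairs)
               - sum(d for _, d in pairs))


def criterion_c3(b, pairs, x0, var0):
    # Sum of squared deviations on the original scale.
    return sum((anchored_model(x, x0, var0, b) - d) ** 2 for x, d in pairs)


def fit_exponent(criterion, pairs, x0, var0, lo=-2.0, hi=0.0, steps=20001):
    """Grid search for the exponent B that minimizes the chosen criterion."""
    best_b, best_val = lo, float("inf")
    for k in range(steps):
        b = lo + (hi - lo) * k / (steps - 1)
        val = criterion(b, pairs, x0, var0)
        if val < best_val:
            best_b, best_val = b, val
    return best_b


# Synthetic example: a 40-acre field with variance 0.16 and three county
# points generated exactly from the anchored curve with B = -0.3.
x0, var0 = 40.0, 0.16
pairs = [(x, anchored_model(x, x0, var0, -0.3))
         for x in (300000.0, 450000.0, 600000.0)]
b_c1 = fit_exponent(criterion_c1, pairs, x0, var0)
b_c2 = fit_exponent(criterion_c2, pairs, x0, var0)
b_c3 = fit_exponent(criterion_c3, pairs, x0, var0)
```

On exact synthetic data all three criteria recover the same exponent; they 
differ only when the county points scatter about the curve, which is where 
the choice between them matters. 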


There are 2 x 3 x 3 = 18 combinations between the type of variance 
($\sigma_p^2$ or $\sigma_r^2$), the type of small sampling unit [equations 
(10), (11), or (12)], and the type of estimation criterion that can be tried 
for empirical model development. As the computations were made and as the 
results were evaluated, it was discovered that the introduction of variable r 
led to less accurate variance estimates than when only the variable p was 
used. In addition, criterion C-3 in table 4-1 appeared to yield more accurate 
estimates than the other two criteria. Consequently, no further combinations 
involving the variance $\sigma_r^2$ or the criterion C-1 or C-2 were given 
consideration. This action resulted in only 8 of the 18 combinations actually 
being studied. Each of these combinations is designated as a variance 
estimation method and is listed in table 4-2. 

4.2 DATA INPUT 

The wheat acreages given in the 1974 Agricultural Census Reports were used in 
computing the crop proportion data and in computing the ratios of crop acreage 
to agricultural acreages for both counties and refined strata. The 
agricultural acreages utilized in the computations came from a complete 
enumeration of the 5- by 6-nautical-mile segments in the USGP. In this 
enumeration, Landsat full-frame imagery was used to classify each segment as 
either 0- to 5-, 5- to 10-, ..., or 95- to 100-percent agricultural land. The 
segments with 5-percent or more agricultural land were designated as 
agricultural segments and were used in the computation of county and stratum 
sizes. The number of agricultural segments in a region is called its pseudo 
count (PC) and was taken from the LACIE sampling frame. 

The average field size (more precisely the distribution of field size) varies 
from stratum to stratum and was difficult to determine. The following 
technique, employing 1974 Agricultural Census Reports data, was used to 
estimate the average field size for a given stratum. Suppose $N_i$ and $A_i$, 
respectively, are the number of operators and the 1974 crop acreage for the 
i-th crop in a stratum. Then, average field size, $f_0$, for the stratum is 
estimated by 



where k is the number of major crops in the stratum. The field size estimates 
resulting from this computation are listed in column 7 of table 4-3. 

4.3 EVALUATION OF VARIANCE ESTIMATES 

The stratum variances were estimated for the USGP by each method listed in 
table 4-2, and the results were compared with estimates based on the TY sample 
segment data. Comparisons were made not only against stratum variance 
estimates computed from the Classification and Mensuration Subsystem (CAMS) 
segment wheat proportion estimates but also against estimates computed from 
actual segment wheat proportions for the blind sites. Listed in table 4-4 are 
these two sets of TY stratum variance estimates. Only refined strata with two 
or more available CAMS segment proportion estimates are listed. Not listed 
are eight strata, three of which had one segment. 


TABLE 4-2.- VARIANCE ESTIMATION METHODS 

Method   Variable   Sampling unit combination          Minimization criterion 

  1         r       County and field, equation (10)    C-1 
  2         r       County and field, equation (10)    C-2 
  3         r       County and pixel, equation (12)    C-3 
  4         p       County and field, equation (10)    C-1 
  5         p       County and field, equation (10)    C-2 
  6         p       County and field, equation (10)    C-3 
  7         p       County and pixel, equation (12)    C-3 
  8         p       County and field, equation (11)    C-3 


Suppose $S_{jk}$ is the estimated standard deviation for the j-th stratum 
using the k-th method, and $\sigma_j$ is the TY standard deviation estimate 
for the j-th stratum. Consider the two cases for $\sigma_j$ (either CAMS or 
blind sites) and compute the set of differences, {($S_{jk}$ - $\sigma_j$)}, 
for each method and both cases. The mean and variance of each set of 
differences are then easily computed. If each difference is taken as an 
estimate of the error in estimating the within-stratum variance by a method, 
then the mean and variance of the differences provide an estimate of the 
possible bias and the variance expected in estimating a stratum variance using 
this method. Listed in table 4-5 are the estimated bias and variance for each 
method as measured against both CAMS and blind site standard deviations. In 
both cases, bias estimates are consistently positive for all methods. Except 
for method 7, these estimates are significantly different from zero; with the 
possible exception of method 7, this approach is likely to overestimate the 
stratum variance. 
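
The evaluation just described reduces to the mean and variance of a set of 
paired differences, together with a one-sample t statistic for the bias. The 
sketch below is a hedged illustration: the function name is hypothetical, the 
four pairs of standard deviations are made up and are not values from 
table 4-4, and the use of the n - 1 divisor for the variance is an assumption. 

```python
import math


def bias_and_variance(method_sds, reference_sds):
    """Mean, variance, and t statistic of the differences S_jk - sigma_j."""
    diffs = [s - r for s, r in zip(method_sds, reference_sds)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two strata")
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # n - 1 assumed
    # t statistic for testing whether the mean difference (bias) is zero.
    t_stat = mean / math.sqrt(var / n) if var > 0 else float("inf")
    return mean, var, t_stat


# Hypothetical standard deviations for four strata: one method's estimates
# and the corresponding reference (CAMS or blind site) estimates.
method_sds = [0.12, 0.10, 0.15, 0.09]
reference_sds = [0.09, 0.08, 0.10, 0.08]
bias, variance, t_stat = bias_and_variance(method_sds, reference_sds)
```

The t statistic would then be compared with the tabulated 5-percent critical 
value for n - 1 degrees of freedom to decide whether the bias is significant. 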


Both the bias and the variance estimates are consistently higher for 
variable r than for the variable p as observed by a comparison of methods 1, 
2, and 3 with methods 4, 5, and 7, respectively. As a result, no further 
consideration of computing stratum variances was given to combinations 
involving the variable r. For example, combinations of the sampling unit and 
minimization criterion corresponding to methods 6 and 8 were not tried for the 
variable r. 
Next, parameter estimation criterion C-l (method 4) resulted in higher mean 
square error estimates than criterion C-3 (method 6). Although criteria C-2 
and C-3 competed well in this respect (e.g., the mean square error for method 
5 versus that for method 6), it is preferable to choose criterion C-3 rather 
than C-2 because C-3 gives consideration to the variation in county sizes 
TABLE 4-4.- REFINED STRATUM VARIANCE ESTIMATES USING TY DATA 


CAMS segment estimates 


Ground-truth proportions 
for blind sites 


State Refined 


Number of 
segments 

Average 

wheat 

proportion 

3 

0.143 

21 

.140 

10 

.351 

7 

.302 

10 

.294 

23 

.213 

21 

.255 

7 

.035 

11 

.040 

3 

.284 

2 

.300 

7 

.026 

7 

.031 

8 

.120 

7 

.211 

6 

.273 

7 

.129 

6 

.245 

14 

.056 


Standard Number of 


0.090 

.138 


Average 

wheat 

proportion 

Standard 

deviation 

0.095 

0.064 

.333 

.074 

.339 

.080 

.355 

.040 

.232 

.078 

.297 

.157 

.028 

.001 

.051 

.055 

.338 

.127 

.026 

.014 

.097 

■9 

.159 

■s 

.259 

.108 

.105 

.059 

.159 

.088 

.063 

.024 

.195 

.272 

.091 

.040 

.051 

.073 




TABLE 4-5.- THE ESTIMATED BIAS AND VARIANCES IN ESTIMATING STRATA VARIANCES 

           Blind site ground truth        CAMS segment estimates 
Method     Bias          Variance         Bias          Variance 
           estimate      estimate         estimate      estimate 

  1        0.0379        0.00337          0.0274        0.00148 
  2         .0585         .00397           .0477         .00204 
  3         .0307         .00278           .0195         .00140 
  4         .0432         .00256           .0359         .00253 
  5         .0348         .00295           .0215         .00162 
  6         .0494         .00219           .0350         .00150 
  7         .0134 N*      .00200           .0013 N*      .00123 
  8         .0239         .00200           .0110        (illegible) 

Symbol definition: 

CAMS = Classification and Mensuration Subsystem 

N* = Insignificant bias when the 5-percent significance t-test is used 
that is ignored in C-2. Thus, the crop proportion, p, is the variable of 
choice, and the minimization criterion is C-3. 

It should be noted that bias and variance estimates were consistently higher 
for blind site data than for CAMS data. For variance estimates, this was per- 
haps due to a much smaller number of blind sites than the number of acquired 
segments for which CAMS estimates were available. However, higher numbers for 
the bias estimates reflect that stratum variance estimates were on the average 
closer to those obtained from the CAMS segment estimates than to those using 
ground-truth proportions. This implies that the proposed approach is more 
likely to estimate the total error (i.e., sampling and classification 
combined) variance than the sampling error variance. Though desirable, this 
result is somewhat intriguing since no consideration was given to the clas- 
sification variance while developing this methodology. 

The stratum variance estimates produced by this methodology are further
influenced by the sampling unit size, $x_0$ (either field or pixel), used in
developing the modeled variance $\sigma^2_{x_0}$. The situation is graphically
illustrated in figure 3-1 in section 3. A comparison of the numerical results
for methods 6, 7, and 8 shows that the most accurate variance estimates are
obtained using the pixel variance model [i.e., equation (12) for
$\sigma^2_{x_0}$]. This result was somewhat surprising since better variance
estimates were expected from the use of the field variance model [i.e.,
equation (11) for $\sigma^2_{x_0}$], and it may have been due to the
sensitivity of method 8 to the poor field size estimates used in the
evaluation. The field size estimates computed from the ratio of crop acreages
to farm operators were on the average four times larger than field size
estimates computed from a limited set of ground truth given by Pitts and
Badhwar (ref. 10). Note that a farm operator (accounted for by crop type) may
have more than one field of a given crop type; hence, the average field size
can be expected to be smaller than the value estimated using equation (14).
The numerical results tend to confirm this. Regardless of the method used, the
stratum field sizes must be determined and the best possible information
should be used for the evaluation. If data on crop statistics and cropping
practices from which the field size, $f_0$, can be estimated are unavailable,
then Landsat imagery can be employed to obtain an estimate of average field
size for a stratum.

To examine the effect of field size on the stratum variance estimates, similar
computations were made using method 6 corresponding to reduced field sizes of
$0.5f_0$, $0.25f_0$, $0.1f_0$, $0.05f_0$, and the average field size from
Pitts-Badhwar data. The estimated bias and variance resulting from these
calculations are listed in table 4-6. From the table, it is noted that the
bias estimate decreased by a factor of about two and one-half as the field was
reduced to 5 percent of its original size. Yet, variance estimates show no
major change. The case of Pitts-Badhwar corresponds to using a constant value
of $0.25f_0$ for the field size in all strata. The reduction in bias
associated with field size reduction can be taken as numerical confirmation of
the fact that the actual size of sample units having crop proportions either 0
or 1 is substantially smaller than the stratum field size, $f_0$.
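The size of the bias reduction can be checked directly from the table 4-6 entries. The short sketch below is only an arithmetic illustration; the dictionary layout and the function names are hypothetical and are not part of the report.

```python
# Table 4-6 entries for method 6 (CAMS comparison):
# fraction of the original field size -> (bias estimate, variance estimate)
TABLE_4_6 = {
    1.00: (0.0350, 0.00150),
    0.50: (0.0334, 0.00192),
    0.25: (0.0231, 0.00137),
    0.10: (0.0176, 0.00133),
    0.05: (0.0143, 0.00131),
}


def bias_reduction_factor(table, full=1.00, reduced=0.05):
    """Ratio of the bias at the full field size to the bias at a reduced size."""
    return table[full][0] / table[reduced][0]


def relative_variance_change(table, full=1.00, reduced=0.05):
    """Relative change in the variance estimate between two field sizes."""
    return (table[reduced][1] - table[full][1]) / table[full][1]


if __name__ == "__main__":
    print("bias reduction factor:", round(bias_reduction_factor(TABLE_4_6), 2))
    print("relative variance change:", round(relative_variance_change(TABLE_4_6), 3))
```

The bias ratio comes out near two and one-half, while the variance estimate changes by a much smaller relative amount, in line with the statement that the variance shows no major change.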

From the derivation of equation (12) given in appendix B, it is observed that
an adjustment is made to the variance $\sigma^2_{x_0}$ for the proportions of
small squares (pixels) in the strata that are mixed. And, the proportion of
mixed squares is a function not only of the stratum crop proportion, p, but
also of the stratum field size. Yet, when a field size of $0.25f_0$ was
substituted for $f_0$ in method 7, no change in the variance from the value
reported in table 4-5 was observed although a slight reduction in the bias was
observed, 0.0009 versus 0.0013. Similarly, the relationship of equation (10)
to equation (11) is that of making an adjustment to the variance
$\sigma^2_{x_0}$ for a sampling unit equal to the size of an average field to
account for the fact that such a sampling unit is expected to contain both
crop and noncrop acreage. Since the adjustment factor from equation (10) to
equation (11) is a constant multiplier of 4/9, the primary improvement of
equation (11) over equation (10) is to reduce the bias. Note in table 4-5 that
the bias is considerably less for method 8 than for method 6 in both cases,
although the reduction in variance is only from 0.00150 to 0.00109 in the case
of the CAMS comparison and from 0.00219 to 0.00200 in the case of the
ground-truth comparison.


TABLE 4-6.- ESTIMATED(a) BIAS AND VARIANCE FOR REDUCED
FIELD SIZE FOR METHOD 6

Field size              Bias          Variance
                        estimate      estimate

x_0                     0.0350        0.00150
0.5 x_0                  .0334         .00192
0.25 x_0                 .0231         .00137
0.1 x_0                  .0176         .00133
0.05 x_0                 .0143         .00131
Pitts-Badhwar            .0231         .00142
(average field size)

(a) Computed in the case of TY CAMS segment estimates.







Listed in table 4-7 are individual stratum standard deviation estimates 
obtained for methods 7 and 8. The coefficient values of A and B are also 
given. The comparison between the two sets of estimates shows that, with only 
four exceptions, the method 8 stratum variance estimates are larger. This 
result is expected of the methodology, as discussed previously. In addition, 
an examination of A and B values across the strata suggests that A is signifi- 
cantly influenced by the stratum crop proportion and B is highly dependent 
upon the between-county variance. (See table 4-3 for information on the stra- 
tum crop proportion and the between-county variance.) This indicates that 
there is a positive correlation between the crop proportion and the value of 
A, as well as between the value of B and the between-county variance. The
correlation is more pronounced in the case of method 7 than in the case of
method 8.

It should be noted that the parameter B takes on values between -1 and 0.
When the largest area with crop proportion near 0 or 1 is considered for the
sampling unit, the intraclass correlation is near 1 and the stratum variance is
close to the binomial form and almost equal to A; therefore, B = 0. On the
other hand, if the sampling unit is chosen to be a large cluster made of
randomly selected elements, the intraclass correlation is zero and the stratum
variance is equal to A/x, where x is the sampling unit size; therefore, B = -1.
An intuitive understanding of the observed dependence of B on the between- 
county variance component is given as follows. Since a smaller between-county 
variance component is indicative of a possible larger within-county variance 
component and thus a lower intraclass correlation, it follows that a smaller 
value for B may be expected when the between-county variance is small. 
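The two limiting cases can be illustrated numerically. The sketch below assumes the power-law form A x^B that the limiting cases imply, and it uses the standard cluster-sampling identity for the variance of the mean of x equally correlated elements, (A/x)[1 + (x - 1)rho], where rho is the intraclass correlation. Both the identity and the function names are brought in for illustration and are not taken from the report.

```python
import math


def cluster_mean_variance(a, x, rho):
    """Variance of the mean of x equally correlated elements.

    a   : variance of a single element (binomial form p * (1 - p))
    x   : sampling unit size, in elements
    rho : intraclass correlation between elements of the same unit
    """
    return a / x * (1.0 + (x - 1.0) * rho)


def implied_exponent(a, x, rho):
    """Exponent B for which a * x**B equals the cluster-mean variance."""
    return math.log(cluster_mean_variance(a, x, rho) / a) / math.log(x)


if __name__ == "__main__":
    a = 0.3 * (1.0 - 0.3)  # element variance for a crop proportion of 0.3
    for rho in (0.0, 0.2, 0.8, 1.0):
        print(rho, round(implied_exponent(a, 100.0, rho), 3))
```

Under these assumptions a zero intraclass correlation gives B = -1, a correlation of 1 gives B = 0, and intermediate correlations give values in between, which matches the stated range of B.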
































5. CONCLUSION AND SUMMARY


The present study considers several stratum-variance estimation techniques and 
proposes a new method to obtain initial variance estimates for sample alloca- 
tions in designing crop surveys. The approach is to develop empirically a 
relationship between the stratum variance and the sampling unit size. 

A procedure is devised that uses existing and easily available information of 
historical crop statistics in developing this relationship. Consideration is 
given to the field size in order to effect a modification in stratum variance 
that is necessary for small sampling unit sizes. 

Variance estimation is approached in two ways: (1) estimate the stratum vari- 
ance for crop proportion directly by developing the empirical model, and (2) 
first, estimate the stratum variance for the crop to agricultural acreage 
ratio by developing the empirical model, and then combine this variance 
estimate with the stratum mean and variance for the agricultural acreage. 

The numerical results indicated that the first approach should be preferred 
because it led to more accurate estimates (when compared with variance esti- 
mates obtained from segment data for wheat in USGP) than did the second 
approach. 

In addition, the numerical results tend to show that methods 7 and 8 perform
about equally well and that either method produces realistic stratum variance 
estimates, given reliable input data. However, method 8 is probably more sen- 
sitive to the field size variable and should be used if accurate field size 
determinations can be made. Otherwise method 7 is preferable. 

In summary, the study suggests that (1) the technique is viable, (2) care 
should be exercised to ensure the reliability of the input data, and (3) the
field sizes must be realistically estimated either from historical statistics 
or Landsat imagery. 
APPENDIX A

WITHIN-STRATUM VARIANCE FOR FIELD SIZE SAMPLING UNIT

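Let $f_0$ be the average field size. Suppose a stratum is divided into square
units, each equal to the average field size. In general, a randomly placed
sample element consists of areas from four different square units as shown
in figure A-1. When the field boundaries are aligned with the grid coordinates
and the units are assumed to be independent for the crop of interest, the
field crop acreage is given by

$$A = \sum_{i=1}^{4} a_i A_i$$

where

$$A_1 = XY, \quad A_2 = (1 - X)Y, \quad A_3 = (1 - X)(1 - Y), \quad A_4 = X(1 - Y),$$

$X \sim U(0,1)$ and $Y \sim U(0,1)$ are two stochastically independent uniform
random variables, and the random variables $a_i$ are defined by

$$a_i = \begin{cases} 1, & \mathrm{Prob}[a_i = 1] = P \\ 0, & \mathrm{Prob}[a_i = 0] = 1 - P \end{cases}$$

Then

$$E(A) = E\left[ E\left( \sum_{i=1}^{4} a_i A_i \,\Big|\, A_i\text{'s} \right) \right] = P\,E\left( \sum_{i=1}^{4} A_i \right) = P$$

and

$$\begin{aligned}
\mathrm{Var}(A) &= E\left[ \mathrm{Var}\left( \sum_{i=1}^{4} a_i A_i \,\Big|\, A_i\text{'s} \right) \right] + \mathrm{Var}\left[ E\left( \sum_{i=1}^{4} a_i A_i \,\Big|\, A_i\text{'s} \right) \right] \\
&= E\left[ \sum_{i=1}^{4} A_i^2\, \mathrm{Var}(a_i) \right] + \mathrm{Var}\left[ \sum_{i=1}^{4} A_i\, E(a_i) \right] \\
&= P(1 - P) \sum_{i=1}^{4} E(A_i^2) + P^2\, \mathrm{Var}\left( \sum_{i=1}^{4} A_i \right) \\
&= 4P(1 - P)\,E(A_1^2) + 0
\end{aligned}$$

since $\sum_{i=1}^{4} A_i = 1$ is a constant and
$\sum_{i=1}^{4} E(A_i^2) = 4E(A_1^2)$ due to symmetry.

Next

$$E(A_1^2) = E(X^2 Y^2) = [E(X^2)][E(Y^2)] = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$$

Thus

$$\mathrm{Var}(A) = \frac{4}{9} P(1 - P)$$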
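The result of this appendix can be checked by simulation. The sketch below is a minimal Monte Carlo illustration under the same assumptions (a unit-size sample element overlapping four independent square units, each carrying the crop with probability P); the function names, the trial count, and the seed are arbitrary choices made for the example.

```python
import random


def simulate_field_unit(p, n_trials=200_000, seed=0):
    """Monte Carlo mean and variance of A = sum(a_i * A_i).

    p is the probability that any one of the four overlapped square
    units carries the crop of interest.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_trials):
        x = rng.random()
        y = rng.random()
        # Areas of the sample element falling in the four square units.
        areas = (x * y, (1 - x) * y, (1 - x) * (1 - y), x * (1 - y))
        # Each unit independently carries the crop with probability p.
        a = sum(area for area in areas if rng.random() < p)
        total += a
        total_sq += a * a
    mean = total / n_trials
    variance = total_sq / n_trials - mean * mean
    return mean, variance


def theoretical_variance(p):
    """Closed form derived above: Var(A) = (4/9) P (1 - P)."""
    return 4.0 / 9.0 * p * (1.0 - p)


if __name__ == "__main__":
    mean, variance = simulate_field_unit(0.3)
    print(mean, variance, theoretical_variance(0.3))
```

The simulated mean should be close to P and the simulated variance close to (4/9)P(1 - P), that is, the binomial variance reduced by the constant factor 4/9.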
APPENDIX B


WITHIN-STRATUM VARIANCE FOR A VERY SMALL SAMPLING UNIT (PIXEL)

Developed in this appendix is a statistical model for the within-stratum 
variance for sampling units, which are very small relative to the field size 
of the crop of interest. Crop X will refer to the crop of interest. The 
model is developed using the definitions and assumptions in the following 
conceptual experiment. 

A square area unit with diagonal 2d is randomly selected from the area of a
stratum having a proportion p for crop X. A random variable P is defined
over the sample space of the experiment as follows: P takes as its value the
proportion of crop X in the randomly selected square. Probabilities
$\alpha_1$, $\alpha_2$, and $\alpha_3$ are associated, respectively, with the
following events: the square selected is pure and contains only crop X; the
square selected is pure and does not contain crop X; and the square selected
is mixed. With this notation, it is observed that

$$\alpha_1 = \mathrm{Prob}(P = 1)$$
$$\alpha_2 = \mathrm{Prob}(P = 0)$$
$$\alpha_3 = \mathrm{Prob}(0 < P < 1)$$
$$\alpha_1 + \alpha_2 + \alpha_3 = 1$$
$$E(P) = p$$
$$\mathrm{Var}(P) = \alpha_1 (1 - p)^2 + \alpha_2 p^2 + \alpha_3\, E_{P \mid 0<P<1}(P - p)^2$$

where the expectation in the last equation is understood to be taken over the
collection corresponding to the mixed squares. Tractable analytic expressions
for the probabilities $\alpha_1$, $\alpha_2$, and $\alpha_3$ and the expected
value $E_{P \mid 0<P<1}(P - p)^2$ in terms of the stratum-field-size
distribution and the crop proportion, p, for crop X will be derived first.
Assume that the stratum has area A and the crop X fields of length $l_i$ and
width $w_i$ have relative frequencies $f_i$, $i = 1, 2, \ldots, N$. A typical
field of crop X is displayed in figure B-1, where b is the expected "width" of
a square falling on the field boundary (mixed square). It will be shown later
that the average value of $2d\cos\theta$ over $0 \le \theta \le \pi/4$ gives a
reasonable value for b. Since the model derived is for sampling units that are
small relative to crop X field sizes, assume that $b \ll l_i$ and $b \ll w_i$
for all i and the distance between any two fields of crop X is greater than or
equal to b.

To determine the probabilities $\alpha_1$, $\alpha_2$, and $\alpha_3$, first
note that the pure crop area and the mixed area associated with a field of
length $l_i$ and width $w_i$ are given, respectively, by

$$(l_i - b)(w_i - b)$$

and

$$(l_i + b)(w_i + b) - (l_i - b)(w_i - b) \tag{B-1}$$

Next note that the total number of fields of length $l_i$ and width $w_i$ is
given by

$$f_i \left( \frac{pA}{l_i w_i} \right) \tag{B-2}$$

From these equations and the definition of $\alpha_1$ and $\alpha_3$, it
follows that

$$\alpha_1 = \frac{1}{A} \sum_{i=1}^{N} f_i \left( \frac{pA}{l_i w_i} \right) (l_i - b)(w_i - b) = p \sum_{i=1}^{N} \frac{f_i (l_i - b)(w_i - b)}{l_i w_i}$$

$$\alpha_3 = \frac{1}{A} \sum_{i=1}^{N} f_i \left( \frac{pA}{l_i w_i} \right) \left[ (l_i + b)(w_i + b) - (l_i - b)(w_i - b) \right] = p \sum_{i=1}^{N} \frac{2b f_i (w_i + l_i)}{w_i l_i}$$

and

$$\alpha_2 = 1 - \alpha_1 - \alpha_3 \tag{B-3}$$
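The three probabilities are straightforward to compute once the field dimensions are tabulated. The sketch below is illustrative only: it assumes that the relative frequencies f_i are fractions of the crop X area (so that they sum to 1 and the field count in (B-2) is consistent), that all lengths are expressed in the same units as b, and the function name and example field sizes are hypothetical.

```python
def pixel_class_probabilities(p, fields, b):
    """Probabilities of pure-crop, pure-noncrop, and mixed squares.

    p      : stratum proportion of crop X
    fields : iterable of (f_i, l_i, w_i) with the f_i summing to 1
    b      : expected width of the band of mixed squares on a boundary
    Returns (alpha1, alpha2, alpha3) as in equations (B-1) to (B-3).
    """
    fields = list(fields)
    alpha1 = p * sum(f * (l - b) * (w - b) / (l * w) for f, l, w in fields)
    alpha3 = p * sum(2.0 * b * f * (l + w) / (l * w) for f, l, w in fields)
    alpha2 = 1.0 - alpha1 - alpha3
    return alpha1, alpha2, alpha3


if __name__ == "__main__":
    # Two hypothetical field classes, dimensions in pixel widths.
    example_fields = [(0.5, 20.0, 10.0), (0.5, 40.0, 20.0)]
    print(pixel_class_probabilities(0.3, example_fields, b=1.2732))
```

As b shrinks toward zero the mixed probability vanishes and the pure-crop probability tends to p, which is a quick consistency check on the formulas.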

To facilitate the evaluation of $E_{P \mid 0<P<1}(P - p)^2$, assume a square
falling on a field boundary is configured as in figure B-2. The directed
distance from the center of the square to the field boundary is denoted by x,
where x is taken to be positive if the center of the square is not in the
field, and x is taken to be negative if the center of the square is in the
field. The smallest angle that a diagonal makes with the horizontal is denoted
by $\theta$. Now it is easy to see that $|x| \le d\cos\theta$ and
$0 \le \theta \le \pi/4$.


The area of the square contained within the crop field can be expressed as a
function of x and $\theta$ for $0 \le \theta \le \pi/4$ and
$0 \le x \le d\cos\theta$ using simple geometric observations as follows.

$$A(\theta, x) = \begin{cases}
(d\cos\theta - d\sin\theta)\left[\tan(\pi/4 - \theta) + \tan(\pi/4 + \theta)\right]\left(\dfrac{d\cos\theta + d\sin\theta}{2} - x\right) & \text{for } 0 \le x \le d\sin\theta \\[2ex]
\dfrac{1}{2}(d\cos\theta - x)^2\left[\tan(\pi/4 - \theta) + \tan(\pi/4 + \theta)\right] & \text{for } d\sin\theta \le x \le d\cos\theta
\end{cases} \tag{B-4}$$

This formula is readily extended to negative values of x and then adjusted for
the total area of the square, $A_0$, to obtain the following expression for the
proportion of the square contained within the crop field.

$$p(\theta, x) = \begin{cases}
\dfrac{A(\theta, x)}{A_0} & \text{for } 0 \le x \le d\cos\theta \\[2ex]
1 - \dfrac{A(\theta, -x)}{A_0} & \text{for } -d\cos\theta \le x < 0
\end{cases} \tag{B-5}$$

Observe that any angle $0 \le \theta \le \pi/4$ corresponds to two positions of
the square: one where the angle is measured below the horizontal and the other
where the angle is measured above the horizontal. Thus, it follows that the
first and second moments of P, given 0 < P < 1, are obtained by the following.
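$$E_{P \mid 0<P<1}(P) = \frac{4}{\pi} \int_0^{\pi/4} \left\{ \frac{1}{2d\cos\theta} \int_{-d\cos\theta}^{d\cos\theta} p(\theta, x)\, dx \right\} d\theta \tag{B-6}$$

$$E_{P \mid 0<P<1}(P^2) = \frac{4}{\pi} \int_0^{\pi/4} \left\{ \frac{1}{2d\cos\theta} \int_{-d\cos\theta}^{d\cos\theta} [p(\theta, x)]^2\, dx \right\} d\theta \tag{B-7}$$

The first integral is readily evaluated as follows.

$$\begin{aligned}
E_{P \mid 0<P<1}(P) &= \frac{4}{\pi} \int_0^{\pi/4} \frac{1}{2d\cos\theta} \left\{ \int_{-d\cos\theta}^{0} \left[ 1 - \frac{A(\theta, -x)}{A_0} \right] dx + \int_0^{d\cos\theta} \frac{A(\theta, x)}{A_0}\, dx \right\} d\theta \\
&= \frac{4}{\pi} \int_0^{\pi/4} \frac{1}{2d\cos\theta} \int_0^{d\cos\theta} dx\, d\theta \\
&= 1/2
\end{aligned} \tag{B-8}$$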

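Evaluation of the second integral is considerably more involved, requiring
several steps. By using elementary properties of integration and the
definition of $p(\theta, x)$, the second integral can be written as follows.

$$E_{P \mid 0<P<1}(P^2) = \frac{4}{\pi} \int_0^{\pi/4} \frac{1}{2d\cos\theta} \left\{ \int_{-d\cos\theta}^{0} dx - \frac{2}{A_0} \int_0^{d\cos\theta} A(\theta, x)\, dx + \frac{2}{A_0^2} \int_0^{d\cos\theta} [A(\theta, x)]^2\, dx \right\} d\theta \tag{B-9}$$

where

$$\int_0^{d\cos\theta} A(\theta, x)\, dx = d^3 (\cos\theta - \sin\theta)\left[\tan(\pi/4 - \theta) + \tan(\pi/4 + \theta)\right]\left(\frac{1}{6} + \frac{1}{6}\cos\theta\sin\theta\right)$$

and

$$\begin{aligned}
\int_0^{d\cos\theta} [A(\theta, x)]^2\, dx ={}& \frac{d^5}{12} (\cos\theta - \sin\theta)^2 \left[\tan(\pi/4 - \theta) + \tan(\pi/4 + \theta)\right]^2 (3\cos^2\theta\sin\theta + \sin^3\theta) \\
&+ \frac{d^5}{20} \left[\tan(\pi/4 - \theta) + \tan(\pi/4 + \theta)\right]^2 (\cos\theta - \sin\theta)^5
\end{aligned}$$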

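Combining these last three equations and then simplifying reduces equation
(B-7) for $E_{P \mid 0<P<1}(P^2)$ to the following.

$$\begin{aligned}
E_{P \mid 0<P<1}(P^2) = \frac{4}{\pi} \Bigg\{ \frac{\pi}{8}
&- \frac{d^2}{3A_0} \left[ \int_0^{\pi/4} \frac{d\theta}{\cos\theta(\cos\theta + \sin\theta)} + \int_0^{\pi/4} \frac{\sin\theta\, d\theta}{\cos\theta + \sin\theta} \right] \\
&+ \frac{d^4}{A_0^2} \left[ \int_0^{\pi/4} \frac{\cos\theta\sin\theta\, d\theta}{(\cos\theta + \sin\theta)^2} + \frac{1}{3} \int_0^{\pi/4} \frac{\sin^3\theta\, d\theta}{\cos\theta(\cos\theta + \sin\theta)^2} \right] \\
&+ \frac{d^4}{5A_0^2} \int_0^{\pi/4} \frac{(\cos\theta - \sin\theta)^3\, d\theta}{\cos\theta(\cos\theta + \sin\theta)^2} \Bigg\}
\end{aligned} \tag{B-10}$$

Each of the integrals in equation (B-10) can be evaluated by making the
substitution $\theta = \arctan x$ and then using partial fraction techniques.
This yields

$$\begin{aligned}
E_{P \mid 0<P<1}(P^2) = \frac{4}{\pi} \Bigg\{ \frac{\pi}{8}
&- \frac{d^2}{3A_0} \left[ \ln 2 + \left( \frac{\pi}{8} - \frac{1}{4}\ln 2 \right) \right] \\
&+ \frac{d^4}{A_0^2} \left[ \left( \frac{\pi}{8} - \frac{1}{4} \right) + \frac{1}{3}\left( \ln 2 - \frac{1}{4} - \frac{\pi}{8} \right) \right] \\
&+ \frac{d^4}{5A_0^2} \left( 2 - \frac{\pi}{4} - \frac{3}{2}\ln 2 \right) \Bigg\}
\end{aligned} \tag{B-11}$$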
Taking the sampling unit to be one unit square ($A_0 = 1$ and
$d = \sqrt{2}/2$) gives the approximation
$E_{P \mid 0<P<1}(P^2) \approx 0.3682$. Using this approximation for
$E_{P \mid 0<P<1}(P^2)$ and the expression derived earlier for
$E_{P \mid 0<P<1}(P)$ yields the following approximation for Var(P).

$$\mathrm{Var}(P) = \alpha_1 (1 - p)^2 + \alpha_2 p^2 + \alpha_3 (0.3682 - p + p^2) \tag{B-12}$$
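The moments of a mixed square can be checked by numerical integration. The sketch below is an illustration, not the report's computation: it codes the overlap area of equation (B-4) and the proportion of equation (B-5), assumes the square area is 2d^2 (a square with diagonal 2d), and averages over the angle and the boundary offset with a simple midpoint rule. The function names and grid sizes are arbitrary.

```python
import math


def crop_area(theta, x, d):
    """Area of the square inside the field, equation (B-4), for x >= 0."""
    c, s = math.cos(theta), math.sin(theta)
    t = math.tan(math.pi / 4 - theta) + math.tan(math.pi / 4 + theta)
    if x <= d * s:
        return d * (c - s) * t * (d * (c + s) / 2.0 - x)
    return 0.5 * (d * c - x) ** 2 * t


def proportion(theta, x, d):
    """Proportion of the square inside the field, equation (B-5)."""
    a0 = 2.0 * d * d  # area of a square with diagonal 2d
    if x >= 0.0:
        return crop_area(theta, x, d) / a0
    return 1.0 - crop_area(theta, -x, d) / a0


def mixed_moments(d=math.sqrt(2.0) / 2.0, n_theta=400, n_x=400):
    """Midpoint-rule estimates of E(P) and E(P^2) for mixed squares."""
    m1 = m2 = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * (math.pi / 4) / n_theta
        half = d * math.cos(theta)
        s1 = s2 = 0.0
        for j in range(n_x):
            x = -half + (j + 0.5) * 2.0 * half / n_x
            q = proportion(theta, x, d)
            s1 += q
            s2 += q * q
        m1 += s1 / n_x
        m2 += s2 / n_x
    return m1 / n_theta, m2 / n_theta


def var_p(p, alpha1, alpha2, alpha3, second_moment=0.3682):
    """Approximate Var(P) from equation (B-12)."""
    return (alpha1 * (1 - p) ** 2 + alpha2 * p ** 2
            + alpha3 * (second_moment - p + p ** 2))


if __name__ == "__main__":
    print(mixed_moments())
```

The midpoint rule avoids the endpoint at pi/4, where the tangent term in (B-4) is unbounded. For a unit square the first moment should come out at 1/2 and the second moment close to the 0.3682 quoted in the text.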


Taking the width of the band of mixed squares on field boundaries to be the
average "width" of a mixed square (fig. B-2) implies that

$$b = \frac{4}{\pi} \int_0^{\pi/4} 2d\cos\theta\, d\theta = \frac{4d\sqrt{2}}{\pi} = 1.2732 \tag{B-13}$$

This completes the formulas for the probabilities $\alpha_1$, $\alpha_2$, and
$\alpha_3$, and hence, the derivation of Var(P).

In summary, for the derivation of Var(P), it has been assumed that the square
did not fall on a field corner. This, of course, introduces a slight error.
To estimate the magnitude of this error, first note that the probability of
a square falling on a corner is given by

$$\alpha_4 = p \sum_{i=1}^{N} \frac{4b^2 f_i}{l_i w_i} \tag{B-14}$$

and the probability of a square falling on a field boundary and not on a
corner is given by

$$\alpha_3' = \alpha_3 - \alpha_4 \tag{B-15}$$

Hence, a more precise equation for Var(P) is

$$\mathrm{Var}(P) = \alpha_1 (1 - p)^2 + \alpha_2 p^2 + \alpha_3' (0.3682 - p + p^2) + \alpha_4 E_c(P - p)^2 \tag{B-16}$$
where the expectation $E_c$ is understood to be taken over the collection
corresponding to the mixed squares that intersect a corner.

It would be very laborious to derive an analytic expression for
$E_c(P - p)^2$. However, if $\theta$ is assumed to be $\pi/4$ (the case when
the sides of the field are parallel to the sides of the square), then it is
easy to show that

$$E_c(P - p)^2 = E_c(P^2) - 2pE_c(P) + p^2 = \frac{1}{9} - \frac{p}{2} + p^2 \tag{B-17}$$

Hence,

$$\mathrm{Var}(P) = \alpha_1 (1 - p)^2 + \alpha_2 p^2 + \alpha_3' (0.3682 - p + p^2) + \alpha_4 \left( \frac{1}{9} - \frac{p}{2} + p^2 \right) \tag{B-18}$$

For the field sizes and proportion p encountered in this study, equation (B-18)
yields values that are within a few percentage points of the values obtained
using equation (B-12) for Var(P) derived earlier. Table B-1 gives the relative
change encountered using equation (B-18) for Var(P) for the selected
proportions p and field sizes S in acres.


TABLE B-1.- VARIANCE OF P FOR SOME COMBINATIONS OF p AND S

                                  Percent
   S     p = 0.01  p = 0.10  p = 0.20  p = 0.30  p = 0.40  p = 0.50  p = 0.60

   25      5.0       4.8       3.7       2.6       1.0      -1.7      -7.3
   50      2.5       2.2       1.8       1.2       0.5      -0.7      -2.9
  100      1.2       1.1       0.9       0.6       0.2   (illegible)  -1.3
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